MEANT: An inexpensive, high-accuracy, semi-automatic metric for evaluating translation utility based on semantic roles

نویسندگان

  • Chi-kiu Lo
  • Dekai Wu
چکیده

We introduce a novel semi-automated metric, MEANT, that assesses translation utility by matching semantic role fillers, producing scores that correlate with human judgment as well as HTER but at much lower labor cost. As machine translation systems improve in lexical choice and fluency, the shortcomings of widespread n-gram based, fluency-oriented MT evaluation metrics such as BLEU, which fail to properly evaluate adequacy, become more apparent. But more accurate, nonautomatic adequacy-oriented MT evaluation metrics like HTER are highly labor-intensive, which bottlenecks the evaluation cycle. We first show that when using untrained monolingual readers to annotate semantic roles in MT output, the non-automatic version of the metric HMEANT achieves a 0.43 correlation coefficient with human adequacy judgments at the sentence level, far superior to BLEU at only 0.20, and equal to the far more expensive HTER. We then replace the human semantic role annotators with automatic shallow semantic parsing to further automate the evaluation metric, and show that even the semiautomated evaluation metric achieves a 0.34 correlation coefficient with human adequacy judgment, which is still about 80% as closely correlated as HTER despite an even lower labor cost for the evaluation procedure. The results show that our proposed metric is significantly better correlated with human judgment on adequacy than current widespread automatic evaluation metrics, while being much more cost effective than HTER.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Evaluating Machine Translation Utility via Semantic Role Labels

We present the methodology that underlies new metrics for semantic machine translation evaluation that we are developing. Unlike widely-used lexical and n-gram based MT evaluation metrics, the aim of semantic MT evaluation is to measure the utility of translations. We discuss the design of empirical studies to evaluate the utility of machine translation output by assessing the accuracy for key ...

متن کامل

BiMEANT: Integrating Cross-Lingual and Monolingual Semantic Frame Similarities in the MEANT Semantic MT Evaluation Metric

We present experimental results showing that integrating cross-lingual semantic frame similarity into the semantic frame based automatic MT evaluation metric MEANT improves its correlation with human judgment on evaluating translation adequacy. Recent work shows that MEANT more accurately reflects translation adequacy than other automatic MT evaluation metrics such as BLEU or TER, and that more...

متن کامل

Semiautomatic Image Retrieval Using the High Level Semantic Labels

Content-based image retrieval and text-based image retrieval are two fundamental approaches in the field of image retrieval. The challenges related to each of these approaches, guide the researchers to use combining approaches and semi-automatic retrieval using the user interaction in the retrieval cycle. Hence, in this paper, an image retrieval system is introduced that provided two kind of qu...

متن کامل

MEANT at WMT 2013: A Tunable, Accurate yet Inexpensive Semantic Frame Based MT Evaluation Metric

The linguistically transparentMEANT and UMEANT metrics are tunable, simple yet highly effective, fully automatic approximation to the human HMEANT MT evaluation metric which measures semantic frame similarity between MT output and reference translations. In this paper, we describe HKUST’s submission to the WMT 2013 metrics evaluation task, MEANT and UMEANT. MEANT is optimized by tuning a small ...

متن کامل

Better Semantic Frame Based MT Evaluation via Inversion Transduction Grammars

We introduce an inversion transduction grammar based restructuring of the MEANT automatic semantic frame based MT evaluation metric, which, by leveraging ITG language biases, is able to further improve upon MEANT’s already-high correlation with human adequacy judgments. The new metric, called IMEANT, uses bracketing ITGs to biparse the reference and machine translations, but subject to obeying ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011